-
Couldn't load subscription status.
- Fork 46
Support datetime extended type #228
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support datetime extended type #228
Conversation
1e57e7b to
747995c
Compare
df7f4f9 to
3ac6206
Compare
c1cf8b4 to
0bee71e
Compare
3ac6206 to
989af6c
Compare
cf4ed18 to
57491fc
Compare
7181eb0 to
51ece90
Compare
57491fc to
e7aab2b
Compare
76b526b to
7d847c8
Compare
b0d1b7e to
8b25d9d
Compare
|
Class inheritance should be improved before review. |
8234495 to
d9b6748
Compare
|
@oleg-jukovec , thank you for pointing out about custom timezones. Now we encode abbreviated timezones to If user wants to create a import tarantool.msgpack_ext.types.timezones as tt_timezones
tzinfo = datetime.timezone(
datetime.timedelta(minutes=tt_timezones.timezoneAbbrevInfo['MSK']['offset']),
name='MSK'
)
dt = tarantool.Datetime(
year=2022, month=8, day=31, hour=18, minute=7, second=54,
microsecond=308543, nanosecond=321, tzinfo=tzinfo
)I'm not sure we should expose |
2f7f662 to
fc77b4c
Compare
3f02566 to
c8ae392
Compare
Timezones in tarantool-python RFCTarantool datetime may provide timezone info in two forms: It corresponds to Lua Based on Lua API, user can set up only What should we use to store timezone info? The solution should satisfy several criteria:
Let's discuss both of them. We should unambiguously be able to encode it to the same msgpack, i.e. the same What does it mean that it should be useful? It should be not just a reference info, but a real timezone. If user would want to do something with datetime info, it should behave appropriately. Again, for example, if user gets There is a It is rather inconvenient to store to decode tarantool timezones to custom tarantool.Timezone type. Since we already use This type should be useful, so it should implement standard
If we won't be able to accept any other timezones, it would be an another burden on user's shoulders. To impove his experience, we may provide some migration advices. On the other hand, converting may be provided not as expected by user. Since
Let's describe converting rules. If If If timezone has a name that is unknown to Tarantool, we raise an error. If timezone has We would not implement tzindex/tzoffset correspondence checkup. We will wait for tarantool/tarantool#7680 updates. |
Well, actually, you simply can't do it. pandas cannot work with any pandas-dev/pandas#15986 (comment) pandas source code is full of workaround to detect if it is pytz timezone and mess with its internals. |
2dc6c75 to
f95056a
Compare
|
After discussion with @oleg-jukovec , we decided to implement |
|
@oleg-jukovec , @LeonidVas , new revision had been uploaded, humbly requesting one more review iteration. |
d4e0aed to
47d9864
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the changes. I think this is a clearer solution than previous.
| def msgpack_encode(self): | ||
| seconds = self._datetime.value // NSEC_IN_SEC | ||
| nsec = self.nsec | ||
| tzoffset = self.tzoffset | ||
|
|
||
| tz = self.tz | ||
| if tz != '': | ||
| tzindex = tt_timezones.timezoneToIndex[tz] | ||
| else: | ||
| tzindex = 0 | ||
|
|
||
| buf = get_int_as_bytes(seconds, SECONDS_SIZE_BYTES) | ||
|
|
||
| if (nsec != 0) or (tzoffset != 0) or (tzindex != 0): | ||
| buf = buf + get_int_as_bytes(nsec, NSEC_SIZE_BYTES) | ||
| buf = buf + get_int_as_bytes(tzoffset, TZOFFSET_SIZE_BYTES) | ||
| buf = buf + get_int_as_bytes(tzindex, TZINDEX_SIZE_BYTES) | ||
|
|
||
| return buf |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's better to do decoding as a separate function, rather than a class method.
Up to you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, since it works with class internals (like self._datetime.value), it better be a part of the class. And I'm not so sure value should be exposed as a property.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it not true self._datetime.value // NSEC_IN_SEC == Int(self.timestamp) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, until there is no precision loss on conversion from float to int.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I think we may say that the main responsibility of Datetime is to be encoded and decoded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And decoding is already a class method (as an __init__ constructor).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I think we may say that the main responsibility of Datetime is to be encoded and decoded.
At this stage the class looks like DTO and serialization/deserialization methods is ok. But I guess (I did not look at an inteval pull request) that AddInterval/SubInterval/SubDatetime methods will be added in the future. After that, this class will no longer look like a DTO and serialization/deserialization methos will look strange.
In fact, the class does not require the decode method in the current implementation. There is no need for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AddInterval/SubInterval/SubDatetime
They are implemented as operators overload.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In fact, the class does not require the decode method in the current implementation.
Well, I wouldn't be too sure about this. For example, if decode would be external, it would be bad for performance because we would need to create excessive intermediate pandas.Timestamp to transform epoch to year/month/day/etc. Moreover, as far as I remember, pandas and Tarantool treat epoch + tz with different approaches (and that's the reason decode do .replace(tzinfo=pytz.UTC).tz_convert(tzinfo) and __init__ do .replace(tzinfo=tzinfo)), so may need to add some workarounds.
47d9864 to
e71a4ac
Compare
Yeah, it definitely is. Thank you for your advises, the last version of my implementation was dissatisfying for myself too. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
e71a4ac to
8c92b86
Compare
Tarantool supports datetime type since version 2.10.0 [1]. This patch
introduced the support of Tarantool datetime type in msgpack decoders
and encoders.
Tarantool datetime objects are decoded to `tarantool.Datetime` type.
`tarantool.Datetime` may be encoded to Tarantool datetime objects.
`tarantool.Datetime` stores data in a `pandas.Timestamp` object. You can
create `tarantool.Datetime` objects either from msgpack data or by using
the same API as in Tarantool:
```
dt1 = tarantool.Datetime(year=2022, month=8, day=31,
hour=18, minute=7, sec=54,
nsec=308543321)
dt2 = tarantool.Datetime(timestamp=1661969274)
dt3 = tarantool.Datetime(timestamp=1661969274, nsec=308543321)
```
`tarantool.Datetime` exposes `year`, `month`, `day`, `hour`, `minute`,
`sec`, `nsec`, `timestamp` and `value` (integer epoch time with
nanoseconds precision) properties if you need to convert
`tarantool.Datetime` to any other kind of datetime object:
```
pdt = pandas.Timestamp(year=dt.year, month=dt.month, day=dt.day,
hour=dt.hour, minute=dt.minute, second=dt.sec,
microsecond=(dt.nsec // 1000),
nanosecond=(dt.nsec % 1000))
```
`pandas.Timestamp` was chosen to store data because it could be used
to store both nanoseconds and timezone information. In-build Python
`datetime.datetime` supports microseconds at most, `numpy.datetime64` do
not support timezones.
Tarantool datetime interval type is planned to be stored in custom
type `tarantool.Interval` and we'll need a way to support arithmetic
between datetime and interval. This is the main reason we use custom
class instead of plain `pandas.Timestamp`. It is also hard to implement
Tarantool-compatible timezones with full conversion support without
custom classes.
This patch does not yet introduce the support of timezones in datetime.
1. tarantool/tarantool#5941
2. https://pandas.pydata.org/docs/reference/api/pandas.Timestamp.html
Part of #204
Support non-zero tzoffset in datetime extended type.
Use `tzoffset` parameter to set up offset timezone:
```
dt = tarantool.Datetime(year=2022, month=8, day=31,
hour=18, minute=7, sec=54,
nsec=308543321, tzoffset=180)
```
You may use `tzoffset` property to get timezone offset of a datetime
object.
Offset timezone is built with pytz.FixedOffset(). pytz module is already
a dependency of pandas, but this patch adds it as a requirement just in
case something will change in the future.
This patch doesn't yet introduce the support of named timezones
(tzindex).
Part of #204
Support non-zero tzindex in datetime extended type. If both tzoffset and
tzindex are specified, tzindex is prior (same as in Tarantool [1]).
Use `tz` parameter to set up timezone name:
```
dt = tarantool.Datetime(year=2022, month=8, day=31,
hour=18, minute=7, sec=54,
nsec=308543321, tz='Europe/Moscow')
```
You may use `tz` property to get timezone name of a datetime object.
pytz is used to build timezone info. Tarantool index to Olson name
map and inverted one are built with gen_timezones.sh script based on
tarantool/go-tarantool script [2]. All Tarantool unique and alias
timezones present in pytz.all_timezones list. Only the following
abbreviated timezones from Tarantool presents in pytz.all_timezones
(version 2022.2.1):
- CET
- EET
- EST
- GMT
- HST
- MST
- UTC
- WET
pytz does not natively support work with abbreviated timezones due to
its possibly ambiguous nature [3-5]. Tarantool itself do not support
work with ambiguous abbreviated timezones:
```
Tarantool 2.10.1-0-g482d91c66
tarantool> datetime.new({tz = 'BST'})
---
- error: 'builtin/datetime.lua:477: could not parse ''BST'' - ambiguous timezone'
...
```
If ambiguous timezone is specified, the exception is raised.
Tarantool header timezones.h [6] provides a map for all abbreviated
timezones with category info (all ambiguous timezones are marked with
TZ_AMBIGUOUS flag) and offset info. We parse this info to build
pytz.FixedOffset() timezone for each Tarantool abbreviated timezone not
supported natively by pytz.
1. https://www.tarantool.io/en/doc/latest/reference/reference_lua/datetime/new/
2. https://github.com/tarantool/go-tarantool/blob/5801dc6f5ce69db7c8bc0c0d0fe4fb6042d5ecbc/datetime/gen-timezones.sh
3. https://stackoverflow.com/questions/37109945/how-to-use-abbreviated-timezone-namepst-ist-in-pytz
4. https://stackoverflow.com/questions/27531718/datetime-timezone-conversion-using-pytz
5. https://stackoverflow.com/questions/30315485/pytz-return-olson-timezone-name-from-only-a-gmt-offset
6. https://github.com/tarantool/tarantool/9ee45289e01232b8df1413efea11db170ae3b3b4/src/lib/tzcode/timezones.h
Closes #204
8c92b86 to
aa7302a
Compare
msgpack: support datetime extended type
Tarantool supports datetime type since version 2.10.0 [1]. This patch introduced the support of Tarantool datetime type in msgpack decoders and encoders.
Tarantool datetime objects are decoded to
tarantool.Datetimetype.tarantool.Datetimemay be encoded to Tarantool datetime objects.tarantool.Datetimestores data in apandas.Timestampobject. You can createtarantool.Datetimeobjects either from msgpack data or by using the same API as in Tarantool:tarantool.Datetimeexposesyear,month,day,hour,minute,sec,nsecandtimestampproperties if you need to converttarantool.Datetimeto any other kind of datetime object:pandas.Timestampwas chosen to store data because it could be used to store both nanoseconds and timezone information. In-build Pythondatetime.datetimesupports microseconds at most,numpy.datetime64do not support timezones.Tarantool datetime interval type is planned to be stored in custom type
tarantool.Intervaland we'll need a way to support arithmetic between datetime and interval. This is the main reason we use custom class instead of plainpandas.Timestamp. It is also hard to implement Tarantool-compatible timezones with full conversion support without custom classes.This patch does not yet introduce the support of timezones in datetime.
Part of #204
msgpack: support tzoffset in datetime
Support non-zero tzoffset in datetime extended type.
Use
tzoffsetparameter to set up offset timezone:You may use
tzoffsetproperty to get timezone offset of a datetime object.Offset timezone is built with pytz.FixedOffset(). pytz module is already a dependency of pandas, but this patch adds it as a requirement just in case something will change in the future.
This patch doesn't yet introduce the support of named timezones (tzindex).
Part of #204
msgpack: support tzindex in datetime
Support non-zero tzindex in datetime extended type. If both tzoffset and tzindex are specified, tzindex is prior (same as in Tarantool [1]).
Use
tzparameter to set up timezone name:You may use
tzproperty to get timezone name of a datetime object.pytz is used to build timezone info. Tarantool index to Olson name map and inverted one are built with gen_timezones.sh script based on tarantool/go-tarantool script [2]. All Tarantool unique and alias timezones presents in pytz.all_timezones list. Only the following abbreviated timezones from Tarantool presents in pytz.all_timezones (version 2022.2.1):
pytz does not natively support work with abbreviated timezones due to its possibly ambiguous nature [3-5]. Tarantool itself do not support work with ambiguous abbreviated timezones:
If ambiguous timezone is specified, the exception is raised.
Tarantool header timezones.h [6] provides a map for all abbreviated timezones with category info (all ambiguous timezones are marked with TZ_AMBIGUOUS flag) and offset info. We parse this info to build pytz.FixedOffset() timezone for each Tarantool abbreviated timezone not supported natively by pytz.
Closes #204